Lab Week 5 Objectives:

Required packages:

  1. Attach packages
library(tidyverse)
## ── Attaching packages ──────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.1     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.5.2
## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(tseries)
library(forecast)
## Warning: package 'forecast' was built under R version 3.5.2
  1. Get data
energy <- read_csv("energy.csv")
## Parsed with column specification:
## cols(
##   month = col_character(),
##   res_total = col_double(),
##   ind_total = col_double()
## )
  1. Convert to ts data
res_ts <- ts(energy$res_total, frequency = 12, start = c(1973,1))
# res_ts

Plot those…

plot(res_ts)

For each, we should ask ourselves: - Is there a trend? - Do data look additive or multiplicative? - Is there seasonality? - Are there notable outliers?

  1. Decompose to start exploring the data further
res_dc <- decompose(res_ts)
plot(res_dc)

  1. Other ways to explore the data…
# Changes within each month over years recorded:

monthplot(res_ts)

ggseasonplot(res_ts) +
  theme_bw()

  1. Simple moving average to smooth (changing the averaging window)

Using ma function in forecast package

# Have them see what happens when they change the moving window...

sma_res <- ma(res_ts, order = 5)
plot(res_ts)
lines(sma_res, col = "red")

  1. Exploring autocorrelation (ACF) - two ways
# Basic way:
res_acf <- acf(res_ts)

# More information: 
ggtsdisplay(res_ts)

Not surprising: strong seasonality is dominant. There appears to be some trend. It looks relatively additive. Can we test for stationarity?

  1. Augmented Dickey-Fuller test for stationarity

Hypothesis test: null is that the data are NOT stationary. If p < 0.05, we reject the null hypothesis and retain the alternative hypothesis that the data ARE stationary.

adf_res <- adf.test(res_ts) # Yes, stationary
## Warning in adf.test(res_ts): p-value smaller than printed p-value
adf_res # p-value = 0.01
## 
##  Augmented Dickey-Fuller Test
## 
## data:  res_ts
## Dickey-Fuller = -11.342, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
  1. Holt Winters exponential smoothing
# Exponential smoothing: no normality assumption (unbiased)

# Perform Holt Winters
res_hw <- HoltWinters(res_ts) # See smoothing parameters with res_hw
plot(res_hw)

# Then forecast
res_forecast <- forecast(res_hw, h = 60)
plot(res_forecast)

Then check the residuals:

hist(res_forecast$residuals) # Look normally distributed.

  1. Autoregressive integrated moving average (ARIMA) using auto.arima for p, d, q
res_pdq <- auto.arima(res_ts) # [1,0,1][0,1,1]
res_pdq
## Series: res_ts 
## ARIMA(1,0,2)(0,1,1)[12] with drift 
## 
## Coefficients:
##         ar1      ma1      ma2     sma1   drift
##       0.990  -0.4821  -0.4196  -0.8128  0.7520
## s.e.  0.011   0.0407   0.0397   0.0253  0.5872
## 
## sigma^2 estimated as 7326:  log likelihood=-3090.52
## AIC=6193.05   AICc=6193.21   BIC=6218.64
res_arima <- arima(res_ts, order = c(1,0,1), seasonal = list(order = c(0,1,1)))
par(mfrow = c(1,2))
hist(res_arima$residuals)
qqnorm(res_arima$residuals)

forecast_res <- forecast(res_arima, h = 72)
plot(forecast_res)

res_df <- data.frame(forecast_res)
month_seq <- seq(1,72)

res_df_2 <- data.frame(month_seq, res_df) # View this data frame...
 
ggplot(res_df_2, aes(x = month_seq, y = Point.Forecast)) + 
  geom_line() +
  geom_ribbon(aes(ymin = Lo.95, ymax = Hi.95, alpha = 0.2)) +
  theme_minimal()